DETAILED ACTION

1. This office action is in response to application 10/733,995 filed December 11,
2003. Claims 1-16 are pending in the application and have been examined. 

Information Disclosure Statement 

2. The Information Disclosure Statement filed on December 11, 2003 has been
accepted and considered in this application. 

Claim Objections 

3. Claim 16 is written as dependent on claim 1. However, the language of the claim,
and the fact that these limitations are already covered by claim 6, suggest that
this claim should in fact be dependent on claim 13 instead, and it will be considered as
such for purposes of examination. Appropriate correction is required.

Claim Rejections - 35 USC § 101 

4. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

5. Claims 7-12 are rejected under 35 U.S.C. 101 because the claimed invention is 
directed to non-statutory subject matter. Claim 7 attempts to claim a machine-readable 
storage. However, this could be interpreted to include magnetic carrier waves, which are
considered non-statutory under 35 U.S.C. 101. Therefore claim 7 is rejected, as well as
claims 8-12 as being dependent on claim 7.



Claim Rejections - 35 USC § 102 

6. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

7. Claims 1 and 7 are rejected under 35 U.S.C. 102(e) as being anticipated by
Mahajan et al. (US Patent 7,117,153).

8. Consider claim 1, Mahajan teaches a method of evaluating the quality of voice
input recognition by a voice portal (figure 2, shows a method for evaluating recognition 
in a voice system such as figure 1, connected to Wide area Network 173, that could be 
used to access data.), said method comprising the steps of: 

extracting a current grammar (text words in the recognition model to be 
evaluated) from the voice portal (a portion of training text is selected to be spoken 304, 
Figure 3, Column 5 line 11.); 

generating a test input for the current grammar, the test input including a test 
pattern and a set of active grammars for the current grammar (At step 202, a portion of 
training data 304 is spoken by a person 308 to generate a test signal, in order to test the
recognition models; Column 5 line 11.); 

providing the test input to the voice portal (voice recognition system software) 
(The acoustic signal is converted into waveforms by receiver 309 and feature extractor 
310, and the feature vectors are provided to a decoder 312; column 5 lines 13-15.); 

analyzing the test pattern with respect to the set of active grammars with a 
speech recognition engine in the voice portal (At step 204, the predicted sequence of
speech units is aligned with the actual sequence of speech units from training data 304;
column 5, line 37.); and

deriving a measure of quality of recognition for the current grammar (Under one 
embodiment, this objective function is an error function that indicates the degree to 
which the predicted sequence of speech units differs from the actual sequence of 
speech units after the alignment is complete; column 5, lines 44-47.) 
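
For illustration only, the alignment-and-error-function mapping described above can be sketched as follows. This is a minimal sketch, not the objective function of the cited reference: it assumes the error is a plain edit distance between the predicted and actual sequences of speech units, normalized by the length of the actual sequence. The names edit_distance and unit_error_rate are hypothetical.

```python
def edit_distance(predicted, actual):
    """Levenshtein distance between two sequences of speech units.

    Assumption: a generic edit distance stands in for the alignment step;
    the cited reference may align and score differently.
    """
    # prev[j] holds the distance between the predicted prefix processed so far
    # and the first j units of the actual sequence.
    prev = list(range(len(actual) + 1))
    for i, p in enumerate(predicted, start=1):
        curr = [i]
        for j, a in enumerate(actual, start=1):
            cost = 0 if p == a else 1
            curr.append(min(
                prev[j] + 1,         # extra unit in the predicted sequence
                curr[j - 1] + 1,     # unit missing from the predicted sequence
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def unit_error_rate(predicted, actual):
    """Degree to which the predicted sequence differs from the actual one."""
    if not actual:
        raise ValueError("actual sequence must be non-empty")
    return edit_distance(predicted, actual) / len(actual)
```

A rate of 0.0 means the predicted sequence matches the actual sequence exactly; larger values indicate poorer recognition for the grammar under test.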

9. Consider claim 7, Mahajan teaches a machine readable storage (figure 1 shows 
memories 141, 151, 152, 155, and 156 capable of storing the computer code) having
stored thereon a computer program for evaluating the quality of voice input recognition 
by a voice portal (figure 2, shows a method for evaluating recognition in a voice system 
such as figure 1, connected to Wide area Network 173, that could be used to access
data.), said computer program comprising a routine set of instructions which when 
executed by a machine cause the machine to perform the steps of: 

extracting a current grammar (text words in the recognition model to be
evaluated) from the voice portal (a portion of training text is selected to be spoken 304, 
Figure 3, Column 5 line 11.); 

generating a test input for the current grammar, the test input including a test 
pattern and a set of active grammars for the current grammar (At step 202, a portion of 
training data 304 is spoken by a person 308 to generate a test signal; Column 5 line 
11); 

providing the test input to the voice portal (speech recognition system of figure 1)
(The acoustic signal is converted into feature vectors by receiver 309 and feature
extractor 310, and the feature vectors are provided to a decoder 312; column 5 lines 13-15.);

analyzing the test pattern with respect to the set of active grammars with a 
speech recognition engine in the voice portal (At step 204, the predicted sequence of
speech units is aligned with the actual sequence of speech units from training data 304;
column 5, line 37.); and

deriving a measure of quality of recognition for the current grammar (Under one 
embodiment, this objective function is an error function that indicates the degree to 
which the predicted sequence of speech units differs from the actual sequence of 
speech units after the alignment is complete; column 5, lines 44-47.) 

Claim Rejections - 35 USC § 103

10. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

11. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148
USPQ 459 (1966), that are applied for establishing a background for determining 
obviousness under 35 U.S.C. 103(a) are summarized as follows: 

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 

12. Claims 2 and 8 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Mahajan in view of Reich (US PAP 2002/0173955). 

Consider claim 2, Mahajan teaches the method of claim 1, including comparing 
recognition results for the test input with an expected value to assess the measure of 
quality of recognition (Under one embodiment, this objective function is an error function 
that indicates the degree to which the predicted sequence of speech units differs from 
the actual sequence of speech units after the alignment is complete; column 5, lines 44-47.),
but does not specifically teach wherein said deriving step includes the step of
deriving a confidence level and a set of n-best results for the test input, and further
comprising the steps of:

comparing the confidence level and set of n-best results for the test input with an
expected value to assess the measure of quality of recognition. 

In the same field of speech recognition, Reich teaches the step of deriving a 
confidence level and a set of n-best results for the test input (figure 4 shows step 420, 
confidence scores are determined, step 460 N best candidates are selected. The N-best 
candidate with the highest confidence level would inherently exceed the expected 
value.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the confidence scores and N best candidates as taught by Reich 
and compare them with an expected value to determine a quality of the recognition as 
taught by Mahajan in order to provide a more robust method of evaluating a recognition 
system by allowing one to spot ambiguities in the recognition. 
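
As a hedged illustration of the combination discussed above, comparing a confidence level and a set of n-best results against an expected value might look like the sketch below. The function name assess_n_best, the (hypothesis, confidence) tuple layout, and the min_confidence threshold are all assumptions made for this example; neither cited reference is stated to use this form.

```python
def assess_n_best(n_best, expected, min_confidence=0.5):
    """Compare n-best results and their confidence with an expected value.

    n_best: list of (hypothesis_text, confidence) pairs (assumed layout).
    expected: the text the test pattern was generated from.
    min_confidence: hypothetical acceptance threshold for the top result.
    """
    if not n_best:
        raise ValueError("n_best must contain at least one hypothesis")
    # Rank hypotheses from most to least confident.
    ranked = sorted(n_best, key=lambda item: item[1], reverse=True)
    texts = [text for text, _ in ranked]
    top_text, top_conf = ranked[0]
    return {
        "top_correct": top_text == expected,
        # 1-based rank of the expected text, or None if it was not returned.
        "expected_rank": texts.index(expected) + 1 if expected in texts else None,
        "confident": top_conf >= min_confidence,
    }
```

A result in which the expected text appears only at rank 2 or lower, or in which the top confidence is low, is the kind of ambiguity the combined teaching would allow one to spot.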

Consider claim 8, Mahajan teaches the machine readable storage of claim 7, 
including comparing recognition results for the test input with an expected value to 
assess the measure of quality of recognition (Under one embodiment, this objective 
function is an error function that indicates the degree to which the predicted sequence 
of speech units differs from the actual sequence of speech units after the alignment is 
complete; column 5, lines 44-47.), but does not specifically teach wherein said deriving
step includes the step of deriving a confidence level and a set of n-best results for the
test input, and further comprising the steps of:

comparing the confidence level and set of n-best results for the test input with an
expected value to assess the measure of quality of recognition. 

In the same field of speech recognition, Reich teaches the step of deriving a 
confidence level and a set of n-best results for the test input (figure 4 shows step 420, 
confidence scores are determined, step 460 N best candidates are selected. The N-best
candidate with the highest confidence level would inherently exceed the expected
value.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the confidence scores and N best candidates as taught by Reich 
and compare them with an expected value to determine a quality of the recognition as 
taught by Mahajan in order to provide a more robust method of evaluating a recognition 
system by allowing one to spot ambiguities in the recognition. 

13. Claims 3, 4, 9, and 10 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Mahajan in view of Yuschik (US Patent 7,139,706). 

14. Consider claim 3, Mahajan teaches the method of claim 1, but does not
specifically teach the steps of: 

modifying the current grammar to create a modified grammar if the measure of 
quality of recognition for the current grammar deviates from a pre-determined range. 

In the same field of recognition analysis, Yuschik teaches modifying the current 
grammar to create a modified grammar (word list to be used for recognition) if the 
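measure of quality of recognition for the current grammar deviates from a pre-determined
range (figure 3, step 340 does an acoustic analysis to determine similarity in
order to reduce recognition error, step 350 selects alternative words if necessary,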
thereby providing a less confusable alternative to the words available to be recognized; 
column 11 line 34- column 13 line 3). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to modify the grammar in a voice system as taught by Yuschik and 
combine this with the analysis of Mahajan in order to improve a speech recognizer's
performance. 

15. Consider claim 4, Mahajan in view of Yuschik suggests the method of claim 3, 
further comprising the steps of: 

(i) generating a test input for the modified grammar, the test input including a test 
pattern and a set of active grammars for the modified grammar; 

(ii) providing the test input for the modified grammar to the voice portal; 

(iii) analyzing the test pattern for the modified grammar with respect to the set of 
active grammars corresponding to the modified grammar with the speech recognition 
engine in the voice portal; 

(iv) deriving a measure of quality of recognition of the modified grammar; and 

(v) re-modifying the modified grammar and repeating steps (i) through (iv) until 
the measure of quality of recognition of the modified grammar does not deviate from a 
pre-determined range. 

This is merely reanalyzing the output of the recognizer after the grammar has 
been updated. Figure 3 of Yuschik shows that the acoustical analysis of 340 is 
repeated until the acoustical difference is great enough to allow for accurate speech
recognition. These analysis steps (i-iv) are the same as those of claim 1, which can clearly be
accomplished by the method of Mahajan as discussed above, and acoustical distance
would obviously affect the result of the analysis. This step would be useful to determine
the recognizability of any alternative words entered into the grammar by the modifying
step, thereby ensuring that the change increased the performance of the recognizer.
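
The repeat-until-in-range structure of steps (i) through (v) can be sketched as a simple loop. This is only an illustrative sketch under stated assumptions: tune_grammar, evaluate, modify, and max_rounds are hypothetical names, the grammar is modeled as a plain word list, and the toy evaluate and modify functions merely stand in for the recognition analysis and the word-substitution step.

```python
def tune_grammar(grammar, evaluate, modify, acceptable_range, max_rounds=10):
    """Re-test and re-modify a grammar until its quality measure is in range.

    evaluate(grammar) -> quality measure (stands in for steps i-iv).
    modify(grammar)   -> modified grammar (stands in for step v).
    max_rounds is a hypothetical safety limit, not taken from the claims.
    """
    low, high = acceptable_range
    history = []
    for _ in range(max_rounds):
        quality = evaluate(grammar)
        history.append(quality)
        if low <= quality <= high:
            return grammar, history
        grammar = modify(grammar)
    raise RuntimeError("quality still outside the acceptable range")


# Toy stand-ins: words assumed to be acoustically confusable, with alternatives.
CONFUSABLE = {"two": "pair", "to": "toward"}


def toy_error_rate(grammar):
    """Fraction of grammar words that are in the confusable set."""
    return sum(word in CONFUSABLE for word in grammar) / len(grammar)


def swap_one_confusable(grammar):
    """Replace the first confusable word with its less confusable alternative."""
    for i, word in enumerate(grammar):
        if word in CONFUSABLE:
            return grammar[:i] + [CONFUSABLE[word]] + grammar[i + 1:]
    return grammar
```

With these toy functions, calling tune_grammar(["two", "to", "stop", "go"], toy_error_rate, swap_one_confusable, (0.0, 0.0)) replaces one confusable word per round and stops once the measured error rate falls within the range.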

16. Consider claim 9, Mahajan teaches the machine readable storage of claim 7, but 
does not specifically teach the steps of: 

modifying the current grammar to create a modified grammar if the measure of 
quality of recognition for the current grammar deviates from a pre-determined range. 

In the same field of recognition analysis, Yuschik teaches modifying the current 
grammar to create a modified grammar (word list to be used for recognition) if the 
measure of quality of recognition for the current grammar deviates from a pre-determined
range (figure 3, step 340 does an acoustic analysis to determine similarity in
order to reduce recognition error, step 350 selects alternative words if necessary,
thereby providing a less confusable alternative to the words available to be recognized;
column 11 line 34- column 13 line 3).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to modify the grammar in a voice system as taught by Yuschik and 
combine this with the analysis of Mahajan in order to improve a speech recognizer's
performance. 

17. Consider claim 10, Mahajan in view of Yuschik suggests the computer readable
storage of claim 9, further comprising the steps of: 

(i) generating a test input for the modified grammar, the test input including a test 
pattern and a set of active grammars for the modified grammar; 

(ii) providing the test input for the modified grammar to the voice portal; 

(iii) analyzing the test pattern for the modified grammar with respect to the set of 
active grammars corresponding to the modified grammar with the speech recognition 
engine in the voice portal; 

(iv) deriving a measure of quality of recognition of the modified grammar; and 

(v) re-modifying the modified grammar and repeating steps (i) through (iv) until 
the measure of quality of recognition of the modified grammar does not deviate from a 
pre-determined range. 

This is merely reanalyzing the output of the recognizer after the grammar has 
been updated. Figure 3 of Yuschik shows that the acoustical analysis of 340 is 
repeated until the acoustical difference is great enough to allow for accurate speech 
recognition. These analysis steps (i-iv) are the same as those of claim 1, which can clearly be
accomplished by the method of Mahajan as discussed above, and acoustical distance
would obviously affect the result of the analysis. This step would be useful to determine
the recognizability of any alternative words entered into the grammar by the modifying
step, thereby ensuring that the change increased the performance of the recognizer.

18. Claims 5, 6, 11-13, 15, and 16 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Mahajan in view of Randic (US Patent 6,275,797). 

19. Consider claim 5, Mahajan teaches the method of claim 1, but does not 
specifically teach modifying the test pattern to emulate one or more user voices prior to 
entering the test input into the voice portal. 

In the same field of speech testing, Randic suggests modifying the test pattern to 
emulate one or more user voices prior to entering the test input into the voice portal 
(Figure 1 shows using a voice test file generated by a TTS engine used to test the voice 
path using recognition. This is a similar technique used to test the quality of recognition 
in Mahajan. Using a computer generated voice to generate the test file, Column 3 line 
27, would inherently allow the test pattern to emulate whatever voice the computer 
generation system was configured to produce. Further, it is well known in the art that 
TTS engines can be configured to allow for the generation of multiple voice types, 
although the claim language suggests that just one voice could be used.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the computerized speech generation as taught by Randic in 
place of the human speaker as taught by Mahajan in order to allow the speech 
recognizer to become more flexible through the quality analysis. 

20. Consider claim 6, Mahajan teaches the method of claim 1, but does not
specifically teach modifying the test pattern to emulate the influence of one or more 
communications network qualities prior to entering the test input into the voice portal. 

In the same field of speech testing, Randic teaches modifying the test pattern to 
emulate the influence of one or more communications network qualities prior to entering 
the test input into the voice portal (figure 3 shows passing the voiced speech pattern 
through a transmission scheme in order to evaluate the effect that the voice channel 
has on recognition; column 4, line 31- column 7 line 29.). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to combine the analysis of the voice channel as taught by Randic with 
the speech recognition quality evaluation of Mahajan in order to make the speech 
recognizer more robust. 

21. Consider claim 11, Mahajan teaches the computer readable storage of claim 7,
but does not specifically teach modifying the test pattern to emulate one or more user 
voices prior to entering the test input into the voice portal. 

In the same field of speech testing, Randic teaches modifying the test pattern to 
emulate one or more user voices prior to entering the test input into the voice portal 
(Figure 1 shows using a voice test file generated by a TTS engine used to test the voice 
path using recognition. This is a similar technique used to test the quality of recognition 
in Mahajan. Using a computer generated voice to generate the test file, Column 3 line 
27, would inherently allow the test pattern to emulate whatever voice the computer 



Application/Control Number: 10/733,995 Page 14 

Art Unit: 2626 

generation system was configured to produce. Further, it is well known in the art that 
TTS engines can be configured to allow for the generation of multiple voice types, 
although the claim language suggests that just one voice could be used.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the computerized speech generation as taught by Randic in 
place of the human speaker as taught by Mahajan in order to allow for more efficient 
and more accurate quality analysis of the recognizer. 

22. Consider claim 12, Mahajan teaches the computer readable storage of claim 7, 
but does not specifically teach modifying the test pattern to emulate the influence of one 
or more communications network qualities prior to entering the test input into the voice 
portal. 

In the same field of speech testing, Randic suggests modifying the test pattern to 
emulate the influence of one or more communications network qualities prior to entering 
the test input into the voice portal (figure 3 shows passing the voiced speech pattern 
through a transmission scheme in order to evaluate the effect that the voice channel 
has on recognition; column 4, line 31- column 7 line 29.). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to combine the analysis of the voice channel as taught by Randic with 
the speech recognition quality evaluation of Mahajan in order to make the speech 
recognizer more robust. 

23. Consider claim 13, Mahajan teaches a system for evaluating the quality of voice
input recognition by a voice portal having a speech recognition engine (figure 3), 
comprising: 

an analysis interface for extracting a set of current grammars from the voice
portal (a portion of training text is selected to be spoken 304, Figure 3, Column 5 line 11);

a test pattern generator for generating a test input for each current grammar, the 
test input including a test pattern and a set of active grammars corresponding to each 
current grammar (At step 202, a portion of training data 304 is spoken by a person 308 
to generate a test signal; Column 5 line 11.);

an apparatus for entering each test pattern into the voice portal (At step 202, a 
portion of training data 304 is spoken by a person 308 to generate a test signal; Column 
5 line 11.); 

a results collector for analyzing each test pattern entered into the voice portal 
with the speech recognition engine against the set of active grammars corresponding to 
the current grammar for said test pattern (At step 204, the predicted sequence of
speech units is aligned with the actual sequence of speech units from training data 304;
column 5, line 37.); and

a results analyzer for deriving a set of statistics of a quality of recognition of each 
current grammar (Under one embodiment, this objective function is an error function 
that indicates the degree to which the predicted sequence of speech units differs from
the actual sequence of speech units after the alignment is complete; column 5, lines 44-47.).

However, Mahajan does not specifically teach using a text-to-speech engine to
enter data into the voice portal.

In the same field of speech signal testing, Randic teaches using a text to speech 
engine to generate test signals for a system (Figure 1 shows using a voice test file 
generated by a TTS engine used to test the voice path using recognition. This is a 
similar technique used to test the quality of recognition in Mahajan. Using a computer 
generated voice to generate the test file, Column 3 line 27, would inherently allow the 
test pattern to emulate whatever voice the computer generation system was configured 
to produce.). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the computerized speech generation as taught by Randic in 
place of the human speaker as taught by Mahajan in order to allow for more efficient 
and more comprehensive quality analysis of the recognizer. 

24. Consider claim 15, Mahajan in view of Randic teaches the system of claim 13,
but does not specifically teach modifying the test pattern to emulate one or more user 
voices prior to entering the test input into the voice portal. 

However Randic teaches modifying the test pattern to emulate one or more user 
voices prior to entering the test input into the voice portal (Figure 1 shows using a voice 
test file generated by a TTS engine used to test the voice path using recognition. This 
is a similar technique used to test the quality of recognition in Mahajan. Using a
computer generated voice to generate the test file, Column 3 line 27, would inherently 
allow the test pattern to emulate whatever voice the computer generation system was 
configured to produce. Further, it is well known in the art that TTS engines can be 
configured to allow for the generation of multiple voice types, although the claim 
language suggests that just one voice could be used.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the computerized speech generation as taught by Randic to 
emulate a user voice in order to allow for more efficient and more accurate quality 
analysis of the recognizer. 

25. Consider claim 16, Mahajan teaches the system of claim 13, wherein the test
pattern generator is modified to emulate the influence of one or more communications 
network qualities prior to entering the test input into the voice portal, (figure 3 shows 
passing the voiced speech pattern through a transmission scheme in order to evaluate 
the effect that the voice channel has on recognition; column 4, line 31- column 7 line 
29.). 

26. Claim 14 is rejected under 35 U.S.C. 103(a) as being unpatentable over Mahajan 
in view of Randic as applied to claim 13 above, and further in view of Reich.

27. Consider claim 14, Mahajan in view of Randic teaches the system of claim 13,
including comparing recognition results for the test input with an expected value to 
assess the measure of quality of recognition for each current grammar (Under one 
embodiment, this objective function is an error function that indicates the degree to 
which the predicted sequence of speech units differs from the actual sequence of 
speech units after the alignment is complete; column 5, lines 44-47.), but does not
specifically teach that the statistics include a confidence level and a set of n-best results 
for each test input. 

In the same field of speech recognition, Reich teaches the step of deriving a 
confidence level and a set of n-best results for the test input (figure 4 shows step 420, 
confidence scores are determined, step 460 N best candidates are selected.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use the confidence scores and N best candidates as taught by Reich 
and compare them with an expected value to determine a quality of the recognition as 
taught by Mahajan in view of Randic in order to provide a more robust method of 
evaluating a recognition system by allowing one to spot ambiguities in the recognition. 



Conclusion 



28. The prior art made of record and not relied upon, which is considered pertinent to
applicant's disclosure, is listed in the Notice of References Cited.




Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Douglas C. Godbold whose telephone number is (571) 
270-1451. The examiner can normally be reached Monday-Thursday 7:00am-4:30pm
and Friday 7:00am-3:30pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard, can be reached at (571) 272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 




